Overcoming model bias for robust offline deep reinforcement learning

Authors

Abstract

State-of-the-art reinforcement learning algorithms mostly rely on being allowed to directly interact with their environment to collect millions of observations. This makes it hard to transfer their success to industrial control problems, where simulations are often very costly or do not exist, and where exploring in the real environment can potentially lead to catastrophic events. Recently developed, model-free, offline RL algorithms can learn from a single dataset (containing limited exploration) by mitigating extrapolation error in value functions. However, the robustness of the training process is still comparatively low, a problem known from methods using value functions. To improve the robustness of the training process, we use dynamics models to assess policy performance instead of value functions, resulting in MOOSE (MOdel-based Offline policy Search with Ensembles), an algorithm which ensures low model bias by keeping the policy within the support of the data. We compare MOOSE with the state-of-the-art model-free algorithms BRAC, BEAR, and BCQ on the Industrial Benchmark and MuJoCo continuous control tasks in terms of robust performance, and find that MOOSE outperforms its model-free counterparts in almost all considered cases, often even by far.
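The core idea sketched in the abstract, evaluating a policy by rolling it out through an ensemble of learned dynamics models and penalizing uncertainty rather than relying on a value function, can be illustrated with a toy example. This is a minimal, hypothetical sketch: the linear "models" and the disagreement-based penalty stand in for the paper's learned neural dynamics models and its data-support penalty, and all names (`ensemble`, `rollout_return`) are illustrative, not the authors' implementation.

```python
import numpy as np

# Hypothetical stand-in for an ensemble of dynamics models trained on
# the offline dataset: slightly different linear systems s' = A s + B a.
ensemble = [
    {"A": np.eye(2) * (0.9 + 0.02 * i), "B": np.array([[0.1], [0.05]])}
    for i in range(4)
]

def rollout_return(policy, s0, horizon=20):
    """Score a policy by simulating it in every ensemble member and
    returning a pessimistic (uncertainty-penalized) estimate."""
    returns = []
    for m in ensemble:
        s, total = s0.copy(), 0.0
        for _ in range(horizon):
            a = policy(s)
            s = m["A"] @ s + (m["B"] @ a).ravel()
            total += -np.sum(s ** 2)  # toy negative-cost reward
        returns.append(total)
    returns = np.asarray(returns)
    # Mean return minus ensemble spread: disagreement between models
    # (a proxy for leaving the data support) lowers the score.
    return returns.mean() - returns.std()

policy = lambda s: np.array([-0.5 * s[0]])  # simple proportional controller
score = rollout_return(policy, np.array([1.0, 0.0]))
```

A policy-search loop would then prefer candidate policies with higher `score`, so policies whose rollouts the ensemble members agree on are favored over ones that drive the models into regions where their predictions diverge.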


Similar articles

Robust Deep Reinforcement Learning with Adversarial Attacks

This paper proposes adversarial attacks for Reinforcement Learning (RL) and then uses these attacks to improve the robustness of Deep Reinforcement Learning (DRL) algorithms to parameter uncertainties. We show that even a naively engineered attack successfully degrades the performance of a DRL algorithm. We further improve the attack using gradient information of an engineered loss func...


Robust Zero-Sum Deep Reinforcement Learning

This paper presents a methodology for evaluating the sensitivity of deep reinforcement learning policies. This is important when agents are trained in a simulated environment and there is a need to quantify the sensitivity of such policies before exposing agents to the real world where it is hazardous to employ RL policies. In addition, we provide a framework, inspired by H∞ control theory, for...


Deep Reinforcement Learning for 2048

In this paper, we explore the performance of a Reinforcement Learning algorithm using a Policy Neural Network to play the popular game 2048. After proposing a model of the state and action spaces, we review our learning process, and train a first model without incorporating any prior knowledge of the game. We prove that a simple Probabilistic Policy Network achieves a 4 times higher maxi...


Operation Scheduling of MGs Based on Deep Reinforcement Learning Algorithm

In this paper, the operation scheduling of Microgrids (MGs), including Distributed Energy Resources (DERs) and Energy Storage Systems (ESSs), is proposed using a Deep Reinforcement Learning (DRL) based approach. Due to the dynamic characteristic of the problem, it is first formulated as a Markov Decision Process (MDP). Next, the Deep Deterministic Policy Gradient (DDPG) algorithm is presented t...


Offline Evaluation of Online Reinforcement Learning Algorithms

In many real-world reinforcement learning problems, we have access to an existing dataset and would like to use it to evaluate various learning approaches. Typically, one would prefer not to deploy a fixed policy, but rather an algorithm that learns to improve its behavior as it gains more experience. Therefore, we seek to evaluate how a proposed algorithm learns in our environment, meaning we ...



Journal

Journal title: Engineering Applications of Artificial Intelligence

Year: 2021

ISSN: 1873-6769, 0952-1976

DOI: https://doi.org/10.1016/j.engappai.2021.104366